biomedical ontology
Bio-KGvec2go: Serving up-to-date Dynamic Biomedical Knowledge Graph Embeddings
Ahmad, Hamid, Paulheim, Heiko, Sousa, Rita T.
Knowledge graphs and ontologies represent entities and their relationships in a structured way, having gained significance in the development of modern AI applications. Integrating these semantic resources with machine learning models often relies on knowledge graph embedding models to transform graph data into numerical representations. Therefore, pre-trained models for popular knowledge graphs and ontologies are increasingly valuable, as they spare the need to retrain models for different tasks using the same data, thereby helping to democratize AI development and enabling sustainable computing. In this paper, we present Bio-KGvec2go, an extension of the KGvec2go Web API, designed to generate and serve knowledge graph embeddings for widely used biomedical ontologies. Given the dynamic nature of these ontologies, Bio-KGvec2go also supports regular updates aligned with ontology version releases. By offering up-to-date embeddings with minimal computational effort required from users, Bio-KGvec2go facilitates efficient and timely biomedical research.
From Knowledge Generation to Knowledge Verification: Examining the BioMedical Generative Capabilities of ChatGPT
Hamed, Ahmed Abdeen, Lee, Byung Suk
The generative capabilities of LLM models present opportunities in accelerating tasks and concerns with the authenticity of the knowledge it produces. To address the concerns, we present a computational approach that systematically evaluates the factual accuracy of biomedical knowledge that an LLM model has been prompted to generate. Our approach encompasses two processes: the generation of disease-centric associations and the verification of them using the semantic knowledge of the biomedical ontologies. Using ChatGPT as the select LLM model, we designed a set of prompt-engineering processes to generate linkages between diseases, drugs, symptoms, and genes to establish grounds for assessments. Experimental results demonstrate high accuracy in identifying disease terms (88%-97%), drug names (90%-91%), and genetic information (88%-98%). The symptom term identification accuracy was notably lower (49%-61%), as verified against the DOID, ChEBI, SYMPTOM, and GO ontologies accordingly. The verification of associations reveals literature coverage rates of (89%-91%) among disease-drug and disease-gene associations. The low identification accuracy for symptom terms also contributed to the verification of symptom-related associations (49%-62%).
Biomedical Entity Linking for Dutch: Fine-tuning a Self-alignment BERT Model on an Automatically Generated Wikipedia Corpus
Hartendorp, Fons, Seinen, Tom, van Mulligen, Erik, Verberne, Suzan
Biomedical entity linking, a main component in automatic information extraction from health-related texts, plays a pivotal role in connecting textual entities (such as diseases, drugs and body parts mentioned by patients) to their corresponding concepts in a structured biomedical knowledge base. The task remains challenging despite recent developments in natural language processing. This paper presents the first evaluated biomedical entity linking model for the Dutch language. We use MedRoBERTa.nl as base model and perform second-phase pretraining through self-alignment on a Dutch biomedical ontology extracted from the UMLS and Dutch SNOMED. We derive a corpus from Wikipedia of ontology-linked Dutch biomedical entities in context and fine-tune our model on this dataset. We evaluate our model on the Dutch portion of the Mantra GSC-corpus and achieve 54.7% classification accuracy and 69.8% 1-distance accuracy. We then perform a case study on a collection of unlabeled, patient-support forum data and show that our model is hampered by the limited quality of the preceding entity recognition step. Manual evaluation of small sample indicates that of the correctly extracted entities, around 65% is linked to the correct concept in the ontology. Our results indicate that biomedical entity linking in a language other than English remains challenging, but our Dutch model can be used to for high-level analysis of patient-generated text.
Can Large Language Models Augment a Biomedical Ontology with missing Concepts and Relations?
Zaitoun, Antonio, Sagi, Tomer, Wilk, Szymon, Peleg, Mor
Ontologies play a crucial role in organizing and representing knowledge. However, even current ontologies do not encompass all relevant concepts and relationships. Here, we explore the potential of large language models (LLM) to expand an existing ontology in a semi-automated fashion. We demonstrate our approach on the biomedical ontology SNOMED-CT utilizing semantic relation types from the widely used UMLS semantic network. We propose a method that uses conversational interactions with an LLM to analyze clinical practice guidelines (CPGs) and detect the relationships among the new medical concepts that are not present in SNOMED-CT. Our initial experimentation with the conversational prompts yielded promising preliminary results given a manually generated gold standard, directing our future potential improvements.
Benchmark datasets for biomedical knowledge graphs with negative statements
Sousa, Rita T., Silva, Sara, Pesquita, Catia
Knowledge graphs represent facts about real-world entities. Most of these facts are defined as positive statements. The negative statements are scarce but highly relevant under the open-world assumption. Furthermore, they have been demonstrated to improve the performance of several applications, namely in the biomedical domain. However, no benchmark dataset supports the evaluation of the methods that consider these negative statements. We present a collection of datasets for three relation prediction tasks - protein-protein interaction prediction, gene-disease association prediction and disease prediction - that aim at circumventing the difficulties in building benchmarks for knowledge graphs with negative statements. These datasets include data from two successful biomedical ontologies, Gene Ontology and Human Phenotype Ontology, enriched with negative statements. We also generate knowledge graph embeddings for each dataset with two popular path-based methods and evaluate the performance in each task. The results show that the negative statements can improve the performance of knowledge graph embeddings.
Ontology and Cognitive Outcomes
Limbaugh, David, Kasmier, David, Rudnicki, Ronald, Llinas, James, Smith, Barry
Here we understand 'intelligence' as referring to items of knowledge collected for the sake of assessing and maintaining national security. The intelligence community (IC) of the United States (US) is a community of organizations that collaborate in collecting and processing intelligence for the US. The IC relies on human-machine-based analytic strategies that 1) access and integrate vast amounts of information from disparate sources, 2) continuously process this information, so that, 3) a maximally comprehensive understanding of world actors and their behaviors can be developed and updated. Herein we describe an approach to utilizing outcomes-based learning (OBL) to support these efforts that is based on an ontology of the cognitive processes performed by intelligence analysts.
Identifying Clinical Terms in Medical Text Using Ontology-Guided Machine Learning
Background: Automatic recognition of medical concepts in unstructured text is an important component of many clinical and research applications, and its accuracy has a large impact on electronic health record analysis. The mining of medical concepts is complicated by the broad use of synonyms and nonstandard terms in medical documents. Objective: We present a machine learning model for concept recognition in large unstructured text, which optimizes the use of ontological structures and can identify previously unobserved synonyms for concepts in the ontology. Methods: We present a neural dictionary model that can be used to predict if a phrase is synonymous to a concept in a reference ontology. Our model, called the Neural Concept Recognizer (NCR), uses a convolutional neural network to encode input phrases and then rank medical concepts based on the similarity in that space.
RDF2Vec-based Classification of Ontology Alignment Changes
Jurisch, Matthias, Igler, Bodo
When ontologies cover overlapping topics, the overlap can be represented using ontology alignments. These alignments need to be continuously adapted to changing ontologies. Especially for large ontologies this is a costly task often consisting of manual work. Finding changes that do not lead to an adaption of the alignment can potentially make this process significantly easier. This work presents an approach to finding these changes based on RDF embeddings and common classification techniques. To examine the feasibility of this approach, an evaluation on a real-world dataset is presented. In this evaluation, the best classifiers reached a precision of 0.8.